Phrase Extraction for Japanese Predictive Input Method as Post-Processing

Author

  • Yoh Okuno
Abstract

We propose a novel phrase extraction system that generates a phrase dictionary for predictive input methods from a large corpus. The system extracts phrases after counting n-grams, so it can be easily maintained, tuned, and re-executed independently. We developed a rule-based filter based on part-of-speech (POS) patterns to extract Japanese phrases. Our experiment shows the usefulness of our system, which achieved a precision of 0.90 and a recall of 0.81, outperforming the N-gram baseline by a large margin.
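The post-processing idea in the abstract can be illustrated with a minimal sketch: n-grams are counted first, and a separate rule-based pass then keeps only those whose POS sequence matches an allowed pattern. The abstract does not give the actual rules, so the tag names, the patterns in `ALLOWED_PATTERNS`, the `min_count` threshold, and all function names below are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical POS patterns; the rules used in the paper are not
# stated in the abstract, so these are illustrative assumptions.
ALLOWED_PATTERNS = {
    ("NOUN",),
    ("NOUN", "NOUN"),
    ("ADJ", "NOUN"),
    ("NOUN", "PARTICLE", "NOUN"),
}


def count_ngrams(sentences, max_n=3):
    """Count n-grams (n = 1..max_n) over POS-tagged sentences.

    Each sentence is a list of (surface, pos) tuples.
    """
    counts = Counter()
    for sent in sentences:
        for n in range(1, max_n + 1):
            for i in range(len(sent) - n + 1):
                counts[tuple(sent[i:i + n])] += 1
    return counts


def extract_phrases(ngram_counts, patterns=ALLOWED_PATTERNS, min_count=2):
    """Post-process counted n-grams: keep those matching a POS pattern.

    Runs independently of counting, so the patterns or the threshold
    can be changed and the filter re-run without recounting.
    """
    phrases = {}
    for ngram, count in ngram_counts.items():
        if count < min_count:
            continue
        pos_seq = tuple(pos for _, pos in ngram)
        if pos_seq in patterns:
            surface = "".join(word for word, _ in ngram)
            phrases[surface] = phrases.get(surface, 0) + count
    return phrases


def precision_recall(extracted, gold):
    """Precision and recall of extracted phrases against a gold set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

Because `extract_phrases` takes only the counted n-grams as input, tuning the pattern set does not require touching the corpus again, which is the maintainability benefit the abstract describes.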


Similar resources

Automatic Semantic Sequence Extraction from Unrestricted Non-Tagged Texts

Morphological processing, syntactic parsing and other useful tools have been proposed in the field of natural language processing (NLP). Many of those NLP tools take dictionary-based approaches. Thus these tools are often not very efficient with texts written in casual wordings or texts which contain many domain-specific terms, because of the lack of vocabulary. In this paper we propose a simpl...


A New Text-Mining Method for Extracting User Context Information to Improve Search Engine Result Ranking

Today, the importance of text processing and its uses is well known among researchers and students. The amount of textual and documentary material increases day by day, so we need useful ways to store it and retrieve information from it. For example, search engines such as Google, Yahoo, and Bing need to read a great many web documents and retrieve the most similar ones to the user ...


Post-Processing of Stream Flows in Switzerland with an Emphasis on Low Flows and Floods

Abstract: Post-processing has received much attention during the last couple of years within the hydrological community, and many different methods have been developed and tested, especially in the field of flood forecasting. Apart from the different meanings of the phrase “post-processing” in meteorology and hydrology, in this paper, it is regarded as a method to correct model outputs (predict...


Do Heavy-NP Shift Phenomenon and Constituent Ordering in English Cause Sentence Processing Difficulty for EFL Learners?

Heavy-NP shift occurs when speakers prefer placing lengthy or “heavy” noun phrase direct objects in the clause-final position within a sentence rather than in the post-verbal position. Two experiments were conducted in this study, and their results suggested that having a long noun phrase affected the ordering of constituents (the noun phrase and prepositional phrase) by advanced Iranian EFL le...


A Polynomial-Order Algorithm For Optimal Phrase Sequence Selection From A Phrase Lattice And Its Parallel Layered Implementation

This paper deals with a problem of selecting an optimal phrase sequence from a phrase lattice, which is often encountered in language processing such as word processing and post-processing for speech recognition. The problem is formulated as one of combinatorial optimization, and a polynomial order algorithm is derived. This algorithm finds an optimal phrase sequence and its dependency structu...



Publication date: 2011